Automated Detection of Non-Relevant Posts on the Russian Imageboard "2ch": Importance of the Choice of Word Representations

نویسندگان

  • Amir Bakarov
  • Olga Gureenkova
چکیده

This study considers the problem of automated detection of non-relevant posts on Web forums and discusses the approach of resolving this problem by approximation it with the task of detection of semantic relatedness between the given post and the opening post of the forum discussion thread. The approximated task could be resolved through learning the supervised classifier with a composed word embeddings of two posts. Considering that the success in this task could be quite sensitive to the choice of word representations, we propose a comparison of the performance of different word embedding models. We train 7 models (Word2Vec, Glove, Word2Vec-f, Wang2Vec, AdaGram, FastText, Swivel), evaluate embeddings produced by them on dataset of human judgements and compare their performance on the task of nonrelevant posts detection. To make the comparison, we propose a dataset of semantic relatedness with posts from one of the most popular Russian Web forums, imageboard “2ch”, which has challenging lexical and grammatical features.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Mental Representations of Lyrical Prose

The article analyzes mental representations of Russian lyrical prose texts. The texts demonstrate collective memory engrams that are defined by cultural and historical legacy of the nation and authors’ creative world perception. In architectonics of a lyrical prose text, sense perception reveals itself in accumulated underlying meanings and wisdom conveyed by expressive means. The author’s inte...

متن کامل

On the Role of Derivational Processes in the Formation of Non-Taxonomic Classes of Lexical Units in Russian

The paper is focused on classes of lexical units which arise as a result of derivational processes – word formation and semantic transfers, acting either in isolation or together, on the basis of common semantic foundations that bind targets and sources of derivation. The lexical items which constitute the classes under study vary in their denotative characteristics and due to their categ...

متن کامل

Gender Concept “Woman” in the Minds of the Russian People (Taking the Chinese as Reference) According to an Associative Experiment

The article is devoted to the study of language representations of the concept of “woman” in the minds of the Russian and Chinese people based on a comparison of associative experiments of two languages, identifying the dynamics of the concept in the language consciousness of the people, establishing the specificity of the concept in the Russian language picture of the world referring to the Ch...

متن کامل

Using functional magnetic resonance imaging (fMRI) to explore brain function: cortical representations of language critical areas

Pre-operative determination of the dominant hemisphere for speech and speech associated sensory and motor regions has been of great interest for the neurological surgeons. This dilemma has been of at most importance, but difficult to achieve, requiring either invasive (Wada test) or non-invasive methods (Brain Mapping). In the present study we have employed functional Magnetic Resonance Imaging...

متن کامل

Using functional magnetic resonance imaging (fMRI) to explore brain function: cortical representations of language critical areas

Pre-operative determination of the dominant hemisphere for speech and speech associated sensory and motor regions has been of great interest for the neurological surgeons. This dilemma has been of at most importance, but difficult to achieve, requiring either invasive (Wada test) or non-invasive methods (Brain Mapping). In the present study we have employed functional Magnetic Resonance Imaging...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017